DETAILED ACTION

1. This Office Action is in response to applicant's Request for Continued Examination
(RCE) dated January 18, 2005, filed in response to the Office Action dated July 14, 2004.

2. Regarding applicant's argument with respect to claim 1 "that the assignment
manager AM1 is different than the interleaver because the interleaver interleaves
instructions from different programs whereas AM1 assigns different tasks from an individual
program to be processed in multiple processors", the Blelloch reference discloses sequential
programs, implying more than one program intended for use with a single processor that
designates each task and selects a subset of the available tasks for parallel processing (See
col. 3, lines 10-50).

3. In response to applicant's argument that the references fail to show certain features
of applicant's invention, it is noted that the features upon which applicant relies (i.e., that
the Blelloch reference does not mention processor interruption when the processor is
waiting for needed information and then supplying that processor with another task to
process so that the processor does not remain idle) are not recited in the rejected claim(s).
That is, the limitation of the processor waiting... being provided with information... so as
not to remain idle is not recited in any of the claims. Although the claims are interpreted in
light of the specification, limitations from the specification are not read into the claims. See
In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).

4. Regarding applicant's argument with respect to claim 25 that "a target program
counter coupled to a plurality of program counters is not disclosed by the combination",
applicant's attention is drawn to Akkary et al. (col. 5, lines 20-33: "...thread management
logic 124 also ends threads by stopping the associated program counter..."). Therefore, any
of the program counters can be a target program counter, namely the one associated with
the thread that thread management logic 124 chooses to end.

5. Regarding applicant's argument with respect to claim 33 that "...wherein each of
said instructions is issued to said one or more units in each cycle...", the Akkary reference
deals with thread management logic that creates different threads from a program or
process in the I-cache, and it would have been obvious to a person of ordinary skill in the
art to use this capability in a manner similar to the instant claim limitation.

6. Regarding applicant's argument with respect to claim 33 and the step in which "no
no-op is inserted into the pipeline for the purpose of ensuring that said next instruction is
not provided to said pipeline until said previous instruction has completed", applicant's
attention is drawn to Naini et al. (col. 2, lines 1-5), wherein "...processor will not issue a
next... instruction... until the previously issued... instruction has cleared..." is similar to the
claim limitation.

Claim Rejections - 35 U.S.C. § 103

7. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set forth
in section 102 of this title, if the differences between the subject matter sought to be patented and the
prior art are such that the subject matter as a whole would have been obvious at the time the invention
was made to a person having ordinary skill in the art to which said subject matter pertains. Patentability
shall not be negatived by the manner in which the invention was made.

8. Claims 1, 2, 4-7, 9-11 and 24 are rejected under 35 U.S.C. 103(a) as being
unpatentable over U.S. Patent No. 5,768,594 to Blelloch et al. in view of U.S. Patent No.
5,710,912 to Schlansker et al., and further in view of U.S. Patent No. 6,161,173 to Krishna
et al., and further in view of U.S. Patent No. 6,493,820 B2 to Akkary et al.

a. Regarding claim 1, Blelloch et al. discloses a programmable processor
(preprocessor PP1, Figure 1) for executing a plurality of programs (col. 2, lines 14-46),
said programmable processor (preprocessor 51) comprising: an execution pipeline
(...an assignment manager... determines tasks available for scheduling... to a system
SY1 containing processing elements... col. 2, lines 28-37); an interleaver (assignment
manager AM1, Figure 1) for interleaving instructions (...sequential programs,
implying more than one program intended for use with a single processor that
designates each task and selects a subset of the available tasks for parallel
processing (See col. 3, lines 10-50)). Blelloch et al. discloses selecting a number of
tasks greater than a total number of available processing elements from all available
tasks and partitioning the selected tasks into a number of groups equal to the
available number of parallel processing elements (col. 1, lines 35-45).
Blelloch et al. does not disclose an execution pipeline having an average pipeline
latency of one instruction per cycle. Schlansker et al. discloses pipeline processing
and associated latencies and defines latency as the number of clock cycles between
the time an input operand is ready for use by a hardware function and the time that
a resultant operand from that function is ready for use by a subsequent hardware
function (col. 1, lines 19-25; col. 2, lines 66-67; col. 1-10). Krishna et al. discloses
the goal of achieving an average pipeline latency of one clock cycle (...a main
scheduler schedules execution of operations and allots a single clock cycle... even
though the execution unit is unable to execute some instructions in a single clock
cycle... local... circuitry controls execution pipelines having latency of two or more
clock cycles... col. 2, lines 35-67) although it is not always possible to do so.
Therefore, it would have been obvious to a person of ordinary skill in the art at the
time the invention was made to modify Blelloch with the feature of "latency in pipeline
recognized with the goal of keeping average pipeline latency at one clock cycle" as
taught by the Schlansker-Krishna combination because it results in a more streamlined
pipeline operation and simplified design (Krishna et al. col. 2, lines 60-67).
However, the Blelloch-Schlansker-Krishna combination does not explicitly disclose the
issue of a plurality of programs in a pipeline setting. Akkary et al. discloses execution
of a plurality of programs (...thread management logic 124 creates different threads
from a program or process... col. 5, lines 24-67; col. 6, lines 1-4), comprising: an
execution pipeline (execution pipeline 108) for interleaving instructions (...a thread
includes the trace... a trace is a... instruction... col. 5, lines 20-25) from said plurality
of programs (...threads are either from completely independent programs or are
from the same program... col. 1, lines 63-65) and providing said instructions (...a
thread includes the trace... a trace is a... instruction... col. 5, lines 20-25) to said
pipeline (execution pipeline 108) for execution. Therefore, it would have been
obvious to a person of ordinary skill in the art at the time the invention was made to
modify the Blelloch-Schlansker-Krishna combination with the "explicit pipelined
structure" as taught by Akkary et al. because it provides for an ability to
concurrently execute different threads efficiently (col. 2, lines 3-6).

b. Regarding claim 2, Blelloch et al. discloses wherein said pipeline has a 
datapath with a depth equal to said number of programs (col. 1, lines 35-45). 

c. Regarding claim 4, the Blelloch-Schlansker-Krishna combination as modified
by Akkary et al. discloses wherein each program of said plurality of programs is
independent of the other of said plurality of programs (...threads... these processors
process and execute are independent of each other... col. 1, lines 58-64).

d. Regarding claim 5, the Blelloch-Schlansker-Krishna combination as modified
by Akkary et al. discloses including an output buffer (ROB 164 and MOB 178) for
storing out of order data output (...the result of an execution and related
information... written to... re-order buffer (ROB) 164... col. 7, lines 36-50).

e. Regarding claims 6 and 7, the Blelloch-Schlansker-Krishna combination as
modified by Akkary et al. discloses including one or more of a register copy (thread
management logic 124), program counter (program counters 112A, ..., 112X... col. 5,
lines 25-30), and program counter stack (thread management logic 124) provided
for each of said plurality of programs, and further discloses wherein one or more of
control and computing resources, instructions, instruction memory, data paths, data
memory, and caches are shared by said plurality of programs (Figure 1 and 2).

f. Regarding claims 9 and 10, the Blelloch-Schlansker-Krishna combination as
modified by Akkary et al. discloses wherein said instructions comprise load
instructions for loading data from a data memory (load buffers 182, Figure 3), and
store instructions for storing data in a memory (store buffers 184, Figure 3) and
wherein said data memory (MOB 178) comprises a cache (data cache 176).

g. Regarding claim 11, it would have been obvious to a person of ordinary skill
in the art at the time the invention was made to have data memory comprising a cache
because it provides for faster execution of programs in a processor system.

h. Regarding claim 24, it is similar in scope to claim 1 above and is rejected 
under the same rationale. 

9. Claim 8 is rejected under 35 U.S.C. 103(a) as being unpatentable over U.S. Patent
No. 5,768,594 to Blelloch et al. in view of U.S. Patent No. 5,710,912 to Schlansker et al.,
and further in view of U.S. Patent No. 6,161,173 to Krishna et al., and further in view of
U.S. Patent No. 6,493,820 B2 to Akkary et al. as applied to claim 1 above, and further in
view of U.S. Patent No. 5,961,628 to Nguyen et al.

a. Regarding claim 8, the Blelloch-Schlansker-Krishna combination implicitly
discloses SIMD execution of vector instructions without addressing vector lengths.
Nguyen et al. explicitly discloses wherein said processor executes SIMD vector
instructions of vector length N and executes in parallel a plurality of instructions
having SIMD vector lengths that sum up to N (col. 1, lines 11-24; col. 53-60).
Therefore, it would have been obvious to one of ordinary skill in the art at the time
the invention was made to modify the device as taught by the Blelloch-Schlansker-Krishna
combination with the feature "SIMD vector instructions execution of vector length L
and plurality of instructions having SIMD vector lengths summing up to N" as taught
by Nguyen et al. because it provides a way to reduce processing time for repetitive
tasks (col. 1, lines 10-25).

10. Claims 12 and 13 are rejected under 35 U.S.C. 103(a) as being unpatentable over
U.S. Patent No. 5,768,594 to Blelloch et al. in view of U.S. Patent No. 5,710,912 to
Schlansker et al., and further in view of U.S. Patent No. 6,161,173 to Krishna et al., and
further in view of U.S. Patent No. 6,493,820 B2 to Akkary et al. as applied to claim 1
above, and further in view of U.S. Patent No. 5,973,705 to Narayanaswami.

a. Regarding claims 12 and 13, the Blelloch... Akkary combination does not
disclose a graphics processor wherein address space of said data memory comprises
a frame buffer unit and a texture memory unit, as it describes a vector processor in
general with possible suggestion of its use in multimedia processing (col. 1, lines
10-25). Narayanaswami explicitly discloses a SIMD graphics processing system
comprising a frame buffer unit (frame buffer 110f, Fig. 2A) while implicitly
suggesting a texture memory unit. Therefore, it would have been obvious to one of
ordinary skill in the art at the time the invention was made to modify the device as
taught by the Blelloch-Akkary combination with the feature "frame buffer and texture
memory unit" as taught by Narayanaswami because it provides a way to reduce
processing time (col. 2, lines 20-22).

11. Claims 3, 14-18 and 23 are rejected under 35 U.S.C. 103(a) as being unpatentable
over U.S. Patent No. 5,768,594 to Blelloch et al. in view of U.S. Patent No. 5,710,912 to
Schlansker et al., and further in view of U.S. Patent No. 6,161,173 to Krishna et al., and
further in view of U.S. Patent No. 6,493,820 B2 to Akkary et al. as applied to claim 1
above, and further in view of U.S. Patent No. 6,209,083 B1 to Naini et al.

a. Regarding claim 3, the Blelloch-Schlansker-Krishna-Akkary combination does
not disclose wherein a next instruction from one of said plurality of programs
(...threads are either from completely independent programs or are from the same
program... col. 1, lines 63-65) is not provided to said pipeline (execution pipeline
108) until a previous instruction of said one of said plurality of programs (...threads
are either from completely independent programs or are from the same
program... col. 1, lines 63-65) has completed. Naini et al. discloses working in the
same respect as the claim limitation: "...processor will not issue a
next... instruction... until the previously issued... instruction has cleared..." (col. 2,
lines 1-5). Naini et al. further indicates that the previous instruction will not have an
exception (col. 2, lines 1-5). The application specification is clear in detailing
avoiding the hardware complexity of pipeline bypasses, instruction reordering or
the inefficiencies of idle cycles (page 11, 1st paragraph) in much the same fashion.
Therefore, it would have been obvious to one of ordinary skill in the art at the time
the invention was made to modify the device as taught by the
Blelloch-Schlansker-Krishna-Akkary combination with the feature "no next instruction
into the pipeline until the previous instruction has completed or retired from the
pipeline" as taught by Naini et al. because it provides a way to reduce pipeline stalling,
or the need for pipeline bypass, instruction reordering or idle cycles in the pipeline
(col. 1, lines 59-60).

b. Regarding claim 14, it is similar in scope to claim 3 above and is rejected 
under the same rationale. 

c. Regarding claims 15 and 16, they are similar in scope to claim 6 above and 
are rejected under the same rationale. 

d. Regarding claim 17, it is similar in scope to claim 2 above and is rejected
under the same rationale. 

e. Regarding claim 18, it is similar in scope to claim 8 above and is rejected 
under the same rationale. 

f. Regarding claim 23, it is similar in scope to claim 3 above and is rejected 
under the same rationale. 

12. Claims 25-33 are rejected under 35 U.S.C. 103(a) as being unpatentable over U.S.
Patent No. 6,493,820 B2 to Akkary et al.

a. Regarding claim 25, Akkary et al. discloses a plurality of program counters
(...program counters 112A, 112B, ..., 112X... col. 5, lines 24-33); each of said
plurality of counters coupled to an instruction memory (I-cache 104); instructions
from said instruction memory (I-cache 104) coupled to an instruction decode
(decoder 106); said decode (decoder 106) coupled to a plurality of registers
(...instructions from MUX 110 are received... in register file 152... col. 7, lines 5-25);
the said plurality of registers coupled to an operand route (...depending on the
instructions, operands may be provided from register file 152 through conductors
168... col. 7, lines 18-25); said operand route coupled to an arithmetic datapath
(execution units 158); said datapath (value writeback 196 and 122) and an output
of a data memory (data cache 114) coupled to a result route (trace buffers 114); and
an output of said result fed back to each of said plurality of registers (...each trace
buffer has an output register file that holds the register context of the associated
thread and an input register file to receive the register context... col. 13, lines 58-67;
col. 14, lines 1-43... register contexts are passed between output register files and
input register files over conductors 216... col. 15, lines 9-25).

b. Regarding claim 26, Akkary et al. discloses said plurality of program counters
is equal to said plurality of programs to be interleaved (...thread management logic
124 creates... by providing... program counters 112A... col. 5, lines 24-33).

c. Regarding claim 27, Akkary et al. discloses said plurality of registers is equal
to said plurality of programs to be interleaved (...allocation involves assigning
registers to the instructions and assigning entries of the reservations stations of
schedule/issue unit... col. 7, lines 10-30).

d. Regarding claim 28, Akkary et al. discloses that the number of trace buffers may
be the same as or different than the number of program counters and that these buffers
may be a single memory divided into individual trace buffers or physically separate trace
buffers or some combination of the two; that each program counter is associated
with a particular thread ID and trace buffer, although there need not be such a
restricted relationship (col. 8, lines 5-15); and that dependency generation and decoding
circuitry 218A could include multiple dependency fields and registers (col. 15, lines
44-58). Therefore, it would have been obvious to a person of ordinary skill in the
art at the time the invention was made to have said plurality of registers be more than
said plurality of programs to be interleaved because it helps increase throughput.

e. Regarding claims 29, 30 and 31, the concept of more resources available
than required would result in increased throughput, and double-buffering is a
well-known scheme to avoid waiting for resources and to carry out data processing in an
efficient manner. This is similar in scope to claim 28 above, and claims 29 and 30 are
rejected under the same rationale.

f. Regarding claim 32, it is similar in scope to claim 7 above and is rejected 
under the same rationale. 

g. Regarding claim 33, it is similar in scope to claim 23 above and is rejected
under the same rationale.

Conclusion

13. The prior art made of record and not relied upon is considered pertinent to
applicant's disclosure. The following prior art teaches SIMD processing and execution of
pipelines in superscalar processors.

U.S. Patent No. 6,470,445 B1 to Arnold et al.
U.S. Patent No. 5,420,990 to McKeen et al.
U.S. Patent No. 6,064,818 to Brown et al.
U.S. Patent No. 5,428,807 to McKeen et al.
U.S. Patent No. 6,282,635 to Sachs
U.S. Patent No. 5,802,386 to Kahle et al.
U.S. Patent No. 5,949,996 to Atsushi
U.S. Patent No. 6,209,078 to Chiang et al.
U.S. Patent No. 5,548,737 to Edrington et al.
U.S. Patent No. 6,412,061 to Dye
U.S. Patent No. 5,809,552 to Kuroiwa et al.
U.S. Patent No. 6,507,862 to Joy et al.

In particular, U.S. Patent No. 6,507,862 to Joy et al. discloses vertical and horizontal
threaded processors. Joy et al. discloses a single pipeline shared among a plurality of machine
states or threads; a thread that is currently active and not stalled is selected and supplies data
to functional blocks connected to the pipeline; when the active thread is stalled, the pipeline
immediately switches to a non-stalled thread and begins executing the non-stalled thread (See
col. 6, lines 10-40; col. 8, lines 15-60).

14. Any inquiry concerning this communication or earlier communications from the
examiner should be directed to Dalip K. Singh whose telephone number is (571) 272-7792.
The examiner can normally be reached on Mon-Fri (8:00AM-6:30PM).

If attempts to reach the examiner by telephone are unsuccessful, the examiner's
supervisor, Matthew Bella, can be reached at (571) 272-7778.

Any response to this action should be mailed to:

Commissioner of Patents and Trademarks
Washington, D.C. 20231

or faxed to:

(703) 872-9314 (for Technology Center 2600 only)

Hand-delivered responses should be brought to Crystal Park II, 2121 Crystal Drive,
Arlington, VA, Sixth Floor (Receptionist).

Any inquiry of a general nature or relating to the status of this application or
proceeding should be directed to the Technology Center 2600 Customer Service Office
whose telephone number is (703) 306-0377.

dks
April 14, 2005

MATTHEW C. BELLA
SUPERVISORY PATENT EXAMINER
TECHNOLOGY CENTER 2600